F/G  V2 


THE  PROCESSES  INVOLVED 
IN  DESIGNING  SOFTWARE 


saesice 

AppkrmoNs 

INCORPORQ16D 


80  12  12  044 


THE  PROCESSES  INVOLVED 
IN  DESIGNING  SOFTWARE 


Technical  Report 
SAI-80-110-DEN 

August  1980 


Robin  Jeffries 
Althea  T.  Turner 
Peter  G.  Poison 
University  of  Colorado 

Michael  E.  Atwood 
Science  Applications,  Inc. 


Reproduction  in  whole  or  in  part  is  permitted  for 
any  purpose  of  the  United  States  Government. 


This  research  was  sponsored  by  the  Personnel  and  Training  Research 
Programs,  Psychological  Sciences  Division,  Office  of  Naval  Research, 
under  Contract  No.  N00014-78-C-0165,  Contract  Authority  Identification 
Number,  NR157-414. 


Approved  for  public  release;  distribution  unlimited. 


Science  Applications,  Inc. 


40  Denver  Technological  Cenier  West.  7935  East  Prentice  Avenue,  Englewood,  Colorado  80111,  303/773-6900 


Other  SAI  Office*:  Albuquerque.  Ann  Arbor,  Arlington,  Atlanta.  Boston,  Chicago,  Huntsville,  La  Jolla.  Los  Angeles.  McLean.  Palo  Alto,  Santa  Barbara.  Sunnyvale,  and  Tucson. 


V\ 


_ Unclassified _ 

SECURITY  CLASSIFICATION  OF  THIS  PAGE  (When  Dalu^Enlered) 

REPORT  DOCUMENTATION  PAGE  before^completing^form** 

I.  REPORT  NUMBER  |2.  GOVT  ACCESSION  NO.  3-  RECIPIENT’S  CAT ALOG  NUMBER 


I  4.  T|TlE  (and  Subtitfe)^_  . 


5.  TYPE  Q£_REPO«T.JUgCainr)_  COVERED 


The  Processes  Involved  in  Designing  Software,  ^(Technical 


7.  AUTH^ifi^^- -  - - 

rp/ffiTchael  E./Atwoot  f  Robin/Jeffries 
OJ  Althea  A. /Turner *vand  PeteivA: /Poison 


-6.  P£AF.U«M4MC  OZGu  REPORT  NUMBER 

>)  SAI-8f)-l  10-DEN  /  ^ 

4.  C0HT^We^'0«-a«j[HJlNUM8ERCft) 


NQ00 1 4- 78- C-p 1 65 


9.  PERFORMING  ORGANI  ZATION  NAME  AND  ADDRESS 

Science  Applications,  Inc. 

7935  E.  Prentice  Avenue 


10.  PROGRAM  ELEMENT.  PROJECT.  TASK 
AREA  &  WORK  UNIT  NUMBERS 

-  6U53N^R.042-06 
h  RRQA2 ‘■0,6*02  y  NR1 57-414 


111-  CONTROLLING  OFFICE  NAME  AND  ADDRESS 


Personnel  &  Training  Research  Programs 
Office  of  Naval  Research  f]j\  ft  / 

Avl  inntrm  \/ A  99917  (  J 


12.  REPORT  DATE 

August  198,0 

13.  NUMBER  OF  PAGES 

53 


'  14.  MONITORING  AGENCY  NAME  &  ADORESS(fir''d//7erenf  /ror>i  Controlling  Office)  I  >5.  SECURITY  CLASS,  (of  this  report) 


lb  }■■#*/::  f 


Unclassi fied 


\5m.  OECLASSIFICATION/DOWNGRADING 
SCHEDULE 


[16.  DISTRIBUTION  STATEMENT  (of  this  Report) 


Approved  for  public  release;  distribution  unlimited 


I  17.  DISTRIBUTION  STATEMENT  (of  the  abstract  entered  in  Block  20,  if  different  from  Report) 


16.  SUPPLEMENTARY  NOTES 


19-  KEY  WORDS  (Continue  on  reverse  aide  if  necessary  and  identify  by  block  number) 

Problem  solving,  planning,  computer  programming 


20.  ABSTRACT  (Continue  on  reverse  aide  If  necessary  end  identify  by  block  number) 

•A  design  task  involves  a  complex  set  of  processes.  Starting  from  a  global 
statement  of  the  problem,  a  designer  must  develop  a  precise  plan  for  a 
solution  that  can  be  realized  in  some  concrete  way.  Software  design,  which 
is  investigated  in  this  paper,  is  the  process  of  translating  a  set  of  task 
requirements  into  a  structural  description  of  a  computer  program  that  will 
perform  the  task.  Through  experience,  designers  acquire  knowledge 
concerning  the  overall  structure  of  a  good  design  and  of  the  processes  -(cont.) 


DD  ,«r»  1473  EDITION  OF  1  NOV  65  IS  OBSOLETE 


Unclassified 


f  /  & 


SECURITY  CLASSIFICATION  OF  THIS  PAGE  (When  Dele  Enle red 


_ Unclassified _ 

SECURITY  CLASSIFICATION  or  THIS  PAGEflWlan  Data  Bntmnd) 

-  of  generating  one.  Using  this  knowledge,  they  direct  their  actions  to 
insure  that  their  designs  will  satisfy  these  constraints.  We  call  this 
abstract  knowledge  about  designs  and  design  processes,  along  with  a  set 
of  procedures  which  implement  these  processes,  a  ''design  schema".  This 
paper  describes  the  design  schema  of  experienced  software  designers  and 
illustrates  its  operation  by  considering  thinking-aloud  protocols 
collected  from  both  expert  and  novice  designers.^ 


Unclassified 


SECURITY  CLASSIFICATION  OR  PAGEOWimi  Da  it  Bnitnd) 


THE  PROCESSES  INVOLVED  IN  DESIGNING  SOFTWARE 
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and 

Michael  E,  Atwood 
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The  task  of  design  Involves  a  complex  set  of  processes.  Starting  from  a 
global  statement  of  a  problem,  a  designer  must  develop  a  precise  plan  for  a 
solution  that  will  be  realized  in  seme  concrete  way,  e.g.,  as  a  building  or  as 
a  computer  program.  Potential  solutions  are  constrained  by  the  need  to 
eventually  map  this  plan  Into  a  real- world  Instantiation.  For  anything  more 
than  the  most  artificial  examples,  design  tasks  are  too  complex  to  be  solved 
directly.  Thus,  an  important  facet  of  designing  Is  decomposing  a  problem  Into 
more  manageable  subunits.  Design  of  computer  systems,  software  design.  Is  the 
particular  design  task  to  be  focused  on  In  this  paper. 

Software  design  Is  the  process  of  translating  a  set  of  task  requirements 
(functional  specifications)  Into  a  structured  description  of  a  computer 
program  that  will  perform  the  task.  There  are  three  major  elements  of  this 
description.  First,  the  specifications  are  decomposed  Into  a  collection  of 
modules,  each  of  which  satisfies  part  of  the  problem  requirements.  This  Is 
often  referred  to  as  a  modular  decomposition.  Second,  the  designer  must 
specify  the  relationships  and  Interactions  among  the  modules.  This  Includes 
the  control  structures,  which  Indicate  the  order  In  which  modules  are 


includes  the  data  structures  Involved  In  the  solution.  One  can  think  of  the 


original  goal-oriented  specifications  as  defining  the  properties  that  the 
solution  must  have.  The  design  Identifies  the  modules  that  can  satisfy  these 
properties.  How  these  modules  are  to  be  Implemented  Is  a  programming  task, 
which  follows  the  design  task. 

This  paper  presents  a  theory  of  the  global  processes  that  experts  use  to 
control  the  development  of  a  software  design.  After  a  review  of  some  relevant 
literature,  the  theory  Is  described  In  detail.  Thinking  aloud  protocols 
collected  from  both  expert  and  novice  designers  on  a  moderately  complex 
problem  provide  evidence  for  these  theoretical  ideas.  Finally,  we  speculate 
on  how  such  processes  might  be  learned. 

RESEARCH  ON  DESIGN  AND  PLANNING 

While  there  has  been  little  research  which  focuses  directly  on  problem 
solving  processes  In  software  design,  there  are  a  number  of  research  areas 
which  are  peripherally  related.  The  first  of  these,  formal  software  design 
methodologies.  Is  Indicative  of  the  guidelines  which  experts  In  the  field 
propose  to  structure  the  task  of  designing.  The  second  area,  automatic 
programming,  provides  a  detailed  analysis  of  the  task  from  an  artificial 
Intelligence  viewpoint.  Finally,  research  on  planning  and  design  gives 
Insight  Into  planning  processes  which  may  be  general  across  domains. 
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Software  Desfgn  Methodologies 

There  are  two  reasons  for  considering  the  professional  literature  in  this 
field.  A  reasonable  model  of  performance  In  any  domain  ought  to  relate  to 
accepted  standards  of  good  practice  In  that  domain  (Kintsch,  1980).  These 
formal ized  methods  were  written  by  experts  In  the  area  trying  to  convey  to 
others  the  procedures  they  use  to  perform  the  task.  In  addition,  most  expert 
designers  are  familiar  with  this  literature  and  may  Incorporate  facets  of 
these  methodologies  Into  their  designs. 

Software  design  involves  generating  a  modular  decomposition  of  a  problem 
that  satisfies  the  requirements  described  in  Its  specifications.  Design 
methods  provide  different  bases  for  performing  modular  decompositions.  There 
are  two  prevailing  views  In  the  literature  as  to  what  this  basis  should  be. 
Both  positions  prescribe  problem  reduction  approaches  to  the  design  process. 
One  focuses  on  data  structures  and  the  other  on  data  flow.  The  various 
methodologies  differ  In  the  nature  and  specificity  of  the  problem  reduction  or 
decomposition  operators  and  of  the  evaluation  functions  for  determining  the 
adequacy  of  alternative  decompositions. 

With  the  data-structure  oriented  approaches  (e.g.,  Jackson,  1975; 

Warnler,  1974),  a  designer  begins  by  specifying  the  input  and  output  data 
structures  according  to  certain  guidelines.  A  modular  decomposition  of  a 
problem  is  Identified  by  deriving  the  mapping  between  the  input  and  output 
data  structures.  Because  such  methods  Involve  the  derivation  of  a  single 
"correct”  decomposition,  there  Is  no  need  for  evaluation  criteria  or  the 
comparison  of  alternative  decompositions. 
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Data-flow  oriented  approaches  (Yourdon  and  Constantine,  1975;  Myers, 

1975)  are  a  collection  of  guidelines  for  identifying  trial  decompositions  of  a 
problem.  Thus,  these  methods  are  more  subjective,  allowing  a  designer  to 
exercise  more  judgment.  As  a  result,  numerous  heuristics  for  evaluating 
potential  decompositions  are  used  with  these  methods.  Examples  of  such 
evaluation  guidelines  Include:  maximizing  the  Independence  and  cohesion  of 
individual  modules,  providing  a  simple  (as  opposed  to  general)  solution  to  tne 
current  subproblem,  etc.  These  guidelines  control  the  evaluation  of  possible 
solutions  to  a  design  problem  and  the  generation  of  new  alternative  designs. 

Most  formal  software  design  methodologies  require  that  the  design  proceed 
through  several  Iterations.  Each  iteration  is  a  representation  of  the  problem 
at  a  more  detailed  level.  That  Is,  the  initial  decomposition  is  a  schematic 
description  of  the  solution.  This  becomes  more  detailed  In  the  subsequent 
Iterations.  In  general,  this  mode  of  decomposing  the  problem  leads  to  a  top- 
down,  breadth-first  expansion  of  a  design. 

There  are  competing  views  that  prescribe  different  modes  of  expansion. 
Some  of  these  are  characterized  by  such  terms  as  "bottom-up",  "middle-out",  or 
"Inside-out"  (Boehm,  1975).  Such  positions  have  been  developed  In  response  to 
what  some  Individuals  feel  are  unsatisfactory  properties  of  a  top-down 
expansion.  There  are  problems  In  which  it  is  necessary  to  understand  certain 
crucial  lower-level  functions  In  order  to  Identify  high  level  constraints  on 
the  design.  These  alternative  modes  of  expansion  may  be  used  by  a  designer  In 
problems  for  which  an  Initial  decomposition  Is  difficult  to  derive.  There  are 
undoubtedly  problems  for  which  each  of  these  methodologies  Is  particularly 


suited.  However,  the  formal  literature  on  software  design  lacks  a  mapping 
between  types  of  problems  and  the  appropriate  design  methodology. 

Automatic  Programming  Systems 

Another  source  of  information  about  the  task  of  software  design  comes 
from  automatic  programming  systems.  The  term  "automatic  programming"  has  been 
used  to  refer  to  activities  ranging  from  the  design  and  development  of 
algebraic  compilers  to  systems  that  can  write  a  program  from  information  given 
In  the  form  of  goal -or tented  specifications  (Blermann,  1976;  Heidorn,  1976). 
The  latter  represent  attempts  to  specify  the  procedures  of  software  design  in 
a  mechanizable  form. 

Simon's  (1963,  1972)  Heuristic  Compiler  was  one  of  the  earl  lest  proposals 
for  a  programming  system  that  generated  code  from  abstract  specifications. 

This  program's  task  was  to  generate  IPL-V  code  for  subroutines  that  were 
components  of  some  larger  program.  It  was  implicitly  assumed  that  the 
original  specifications  had  been  decomposed  Into  detailed  functional 
descriptions  for  a  collection  of  routines  that  would  make  up  the  complete 
program. 

The  definitions  of  routines  to  be  generated  by  the  Heuristic  Compiler 
could  take  one  of  two  forms,  with  each  form  being  handled  by  a  separate 
special  compiler.  The  first  form  involved  a  before  and  after  description  of 
the  states  of  certain  cells  in  the  I  PL  system.  The  specification  described  the 
inputs  and  outputs  of  a  routine.  The  state  description  compiler's  task  was  to 
derive  the  sequence  of  IPL  instructions  that  brought  about  that 
transformation.  The  other  form  of  definitions  was  In  terms  of  Imperative 
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statements  describing  the  function  to  be  performed  by  a  given  subroutine, 
which  was  handled  by  the  function  compiler.  Both  specialized  compilers  used 
suitably  generalized  forms  of  means-ends  analysis  to  generate  sequences  of  IPL 
Instructions  that  would  meet  the  Input  specifications. 

One  branch  of  current  research  on  automatic  programming  can  be  viewed  as 
attempts  to  generalize  the  Ideas  that  were  originally  contained  In  the  state 
description  compiler.  Biermann  (1976)  describes  several  automatic  programming 
systems  that  derive  programs  from  examples  of  Input-output  behavior  for  a 
routine  or  from  formal  descriptions  of  Inputs  and  outputs.  Note  that  the  data- 
structure  oriented  software  design  methodologies  discussed  above  resemble 
these  systems  In  their  focus  on  deriving  detailed  actions  from  inputs  and 
outputs. 

Other  automatic  programming  systems  have  been  developed  that  generate 
routines  from  Information  supplied  through  a  natural  language  dialogue  with 
the  user  (Heldorn,  1976).  These  efforts  can  be  viewed  as  general izatlons  of 
the  function  compiler.  Such  systems  consist  of  four  components  (Green,  1977; 
Heldorn,  1976;  Balzer,  1973).  First,  the  system  acquires  a  description  of  the 
problem  to  be  solved,  frequently  via  Interactions  with  a  relatively  naive 
user.  Second,  this  Information  Is  synthesized  Into  a  coherent  description  of 
the  program  to  be  written  (Green,  1977).  This  description  Is  then  verified, 
and  additional  information.  If  necessary.  Is  acquired  through  further 
Interactions  with  the  user  (Balzer,  Goldman,  and  Wile,  1977).  Finally,  the 
refined  description  Is  used  as  Input  to  a  subsystem  that  synthesizes  the 
program  In  the  high  level  language,  making  decisions  about  data  structures. 


algorithms,  and  control  structures.  Much  of  the  current  work  In  automatic 
programming  focuses  on  the  last  of  these  components. 

Balzer  and  his  colleagues  have  considered  the  task  of  transforming  an 
Informal  natural  language  specification  of  a  program  into  a  formal  description 
of  a  design.  This  design  would  then  be  input  into  a  code  generation 
subsystem.  There  are  two  aspects  of  Balzer's  work  that  are  relevant  here. 
First,  he  attempts  to  develop  techniques  that  enable  one  to  carry  out  the 

t 

Initial  phases  of  the  design  effort.  Incomplete  goal-oriented  specifications 
are  first  translated  into  abstract.  Incomplete  functional  specifications  and 
then  refined  into  a  complete  set  of  formal  specifications  for  the  program. 
Second,  the  knowledge  used  by  Balzer’s  system  is  domain  Independent.  This 
system  can  be  contrasted  with  the  programs  of  Long  (1977)  and  Mark  (1976) 
which  are  strongly  domain  dependent,  and  where  design  problems  are  proposed  In 
a  slngl e  micro-wor I  a. 

A  system  that  Is  designed  to  deal  with  the  problems  of  detailed  design 
and  code  generation  Is  a  program  called  PECOS  (Barstow,  1977,  1979).  PECOS 
generates  LISP  code  from  a  high  level  description  of  input  and  output  data 
structures  and  the  algorithms  to  be  used  to  solve  the  problem.  A 
distinguishing  feature  of  PECOS  is  that  the  program  uses  a  collection  of 
rules.  It  encodes  both  general  knowledge  and  specific  information  about  LISP 
to  guide  its  problem  solving  efforts,  rather  than  using  a  uniform  strategy 
like  means-ends  analysis. 

PECOS’  knowledge  base  Is  In  the  form  of  a  large  set  of  rules.  General 
rules  deal  with  representation  techniques  for  collections,  enumeration 
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techniques  for  collections,  and  representation  techniques  for  mapping.  Each 
of  these  subsets  of  rules  can  be  organized  Into  a  hierarchical  structure  with 
a  number  of  Intermediate  levels  between  the  most  abstract  concepts  (e.g. 
collection)  and  Information  about  specific  procedures  or  data  structures  (e.g. 
1 1 nked  free  cel  Is) . 

PECOS  employs  problem  solving  mechanisms  that  Iteratively  refine  each 
component  of  the  spec  if (cations.  A  partially  refined  subproblem  Is  selected, 
and  then  a  rule  (s  applied  to  it.  Each  rule  application  can  produce  one  of 
three  outcomes.  First,  the  subproblem  can  be  refined  to  the  next  lower  level 
of  detail.  Second,  crucial  properties  of  some  component  of  the  subproblem  can 
be  Identified  and  included  In  the  description.  Third,  additional  information 
about  the  subproblem  can  be  gathered. 

This  review  of  automatic  programming  demonstrates  that  there  are  two 
components  to  the  task  of  software  design.  The  first  is  the  translation  of  the 
initial  goal-oriented  specifications  into  a  high-level  functional 
decomposition  of  the  original  problem.  This  incomplete,  abstract  description 
of  the  problem  must  then  be  refined  into  a  set  of  formal  specifications  that 
precisely  define  data  structures,  control  structures,  and  the  functions 
performed  by  various  modules  In  the  program.  The  second  stage  of  the  design 
process  involves  a  collection  of  implementation  decisions.  These  decisions 
specify  data  structures  and  algorithms  that  satisfy  tne  functional 
descriptions  and  efficiency  criteria.  The  first  phase  requires  powerful 
problem  solving  strategies  that  can  factor  the  original  problem  into  a 
collection  of  subproblems.  It  also  requires  the  generation  of  successive 
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refinements  of  each  subproblem.  Incorporating  more  and  more  detail  about  the 
developing  solution. 

Models  of  Planning  and  Design 

There  exist  two  problem  solving  systems  (Sacerdotl,  1975;  Hayes-Roth  and 
Hayes-Roth,  1979)  which  contain  mechanisms  that  seem  adequate  to  carry  out  the 
processes  required  In  the  Initial  phase  of  the  design  process.  Both  of  these 
systems  generate  a  plan  of  action. 

Sacerdotl's  (1975)  NOAH  solves  robot  planning  problems  by  a  process  of 
successive  refinement.  Sacerdotl  assumes  that  the  knowledge  necessary  to 
generate  a  plan  Is  organized  in  a  collection  of  knowledge  structures,  each  of 
which  contains  the  specification  of  seme  subgoal  and  the  actions  necessary  to 
accomplish  that  subgoal.  Each  unit  of  knowledge  has  the  Information  necessary 
to  take  one  element  of  a  developing  plan  and  produce  Its  next  more  detailed 
refinement.  Sacerdotl  assumes  that  the  complete  plan  Is  generated 
Iteratively.  At  any  stage  of  the  planning  process,  each  segment  of  the  plan 
Is  expanded  to  Its  next  level  of  refinement.  Then  generalized  problem  solving 
processes  called  critics  are  used  to  reorganize  this  more  detailed  plan  Into 
an  internally  consistent  and  efficient  sequence  of  actions.  The  process 
repeats  itself  at  the  next  level,  terminating  with  a  plan  whose  Individual 
steps  can  be  executed  to  solve  the  Initial  problem. 

Hayes-Roth  and  Hayes-Roth  (1979)  describe  a  HEARSAY-1  Ike  system  which 
plans  routes  for  performing  a  collection  of  everyday  errands.  Knowledge  about 
the  planning  of  errands  Is  organized  Into  a  col  lection  of  pattern-directed 
modules,  called  specialists,  that  communicate  through  a  global  data  structure 


called  the  blackboard.  The  behavior  of  this  system  is  opportunistic  in  the 
sense  that  data  currently  on  the  blackboard  can  trigger  a  specialist  that 
makes  a  decision  at  some  arbitrary  level  of  abstraction  In  the  developing 
plan. 

Hayes-Roth  and  Hayes-Roth  point  out  that  a  system  like  NOAH  is  quite 
rigid,  in  that  It  is  restricted  to  a  purely  top-down,  breadth-first  expansion 
of  a  solution.  Their  system,  in  contrast,  ts  capable  of  making  a  best  or  most 
useful  decision  at  any  level  of  abstraction;  is  capable  of  Incremental  or 
partial  planning;  and  can  adopt  different  planning  methods  depending  upon  the 
specifics  of  a  given  problem. 

Many  of  Hayes-Roth  and  Hayes-Roth's  criticisms  concerning  the  rigidity  of 
a  program  like  NOAH  are  well  taken.  On  the  other  hand,  many  of  the  phenomena 
that  they  have  observed  In  their  protocols  may  be  due  to  the  task  and  the 
level  of  expertise  of  their  subjects.  None  of  their  subjects  had  extensive 
experience  with  errand  planning  tasks.  It  may  be  the  case  that  one  would 
observe  quite  different  behavior  in  an  environment  that  required  the  solution 
of  a  large  number  of  subproblems  and  the  Integration  of  these  solutions.  One 
might  also  expect  more  orderly  kinds  of  behavior  In  situations  where 
successful  performance  required  the  Integration  and  utilization  of  a  large, 
well-organized  body  of  relevant  knowledge. 

There  has  been  a  limited  amount  of  research  on  the  process  of  design  or 
on  problems  that  are  difficult  enough  to  require  the  construction  of  an 
elaborate  plan.  Much  of  the  work  on  expert  problem  solving  In  thermodynamics 
(Bhaskar  and  Simon,  1977),  physics  (Larkin,  1977),  and  other  semantically  rich 


10 


_ — 


A  \\ 


domains  Is  not  directly  relevant  to  processes  Involved  in  solving  design 
problems,  since  these  studies  all  use  problems  that  can  be  solved  by  a  single, 
well  understood  problem  method,  or  schema.  An  expert  In  these  domains  first 
has  to  identify  the  relevant  schema  and  then  apply  the  schema  to  the  problem. 
In  contrast,  the  major  task  In  design  Is  the  reduction  of  the  original  problem 
Into  a  collection  of  subproblems. 

Levin  (1976)  has  attempted  to  develop  a  theory  of  software  design 
processes  that  Is  consistent  with  current  thinking  on  the  structure  of  the 
human  Information  processing  system  and  known  problem  solving  methods.  Levin 
postulates  that  design  can  be  viewed  as  Involving  three  fundamental  processes- 
-"selectlng  problems  to  work  on,  gathering  Information  needed  for  the 
solution,  and  generating  solutions”, (  Levin,  1976,  p.  2).  Levin  argues  that 
the  problem  selection  process  Is  control  led  by  a  set  of  global  strategies  and 
local  Information  about  constraints  that  are  directly  relevant  to  the  current 
subproblem.  He  developed  a  simulation  model  that  takes  as  Input  the  protocol 
of  an  expert  designer  working  on  a  fairly  difficult  problem  and  produces  a 
list  of  subgoals  generated  by  that  designer  during  the  process  of  solving  the 
pr  oh  I em . 

Simon  (1973)  sketches  out  a  theory  of  psychological  processes  Involved  a 
design  task  In  the  context  of  discussing  the  distinction  between  well 
structured  and  III  structured  problems. 


The  whole  CarchitecturalJ  design  then,  begins  to 
acquire  structure  by  being  decomposed  Into  various 
problems  of  component  design,  and  by  evoking,  as  the 
design  progresses,  all  kinds  of  requirements  to  be  applied 
In  testing  the  design  of  its  components.  During  any  given 
short  period  of  time,  the  architect  will  find  himself 
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working  on  a  problem  which,  perhaps  being  In  an  III 
structured  state,  soon  converts  Itself  through  evocation 
from  memory  Into  a  well  structured  problem  (Simon,  1973, 
p.  190). 

Simon's  view  of  the  design  process  Is  that  the  original  design  problem  is 
decomposed  Into  a  collection  of  well  structured  subproblems  under  the  control 
of  some  type  of  executive  process  that  carries  out  the  necessary  coordination 
functions.  Also  note  that  Information  retrieved  from  long  term  memory  Is 
Incorporated  Into  the  developing  solution;  It  is  this  additional  Information 
that  converts  the  original  III  structured  problem  Into  a  collection  of  well 
structured  problems. 

Much  of  the  work  discussed  above  focuses  on  the  decomposition  of  complex 
tasks  Into  more  manageable  subtasks.  Our  interpretation  of  the  literature  on 
software  design  is  that  this  decomposl tlonal  process  Is  central  to  the  task. 
Moreover,  we  believe  that  the  mastery  of  decomposition  should  be  what 
differentiates  experts  from  novices.  The  Iheory  to  be  presented  next  Is  built 
on  the  process  of  decomposition  and  Its  associated  control  strategies. 

A  THEORY  OF  PROBLEM  SOLVING  IN  SOFTWARE  DESIGN 

The  following  is  an  outline  of  a  theory  of  processes  Involved  In  solving 
a  software  design  problem.  The  successful  performance  of  this  task  Involves 
the  coordination  of  a  complex  set  of  processes.  Some  apply  abstract  knowledge 
about  the  task.  Others  retrieve  computer  science  knowledge  or  information 
about  the  design  problem  or  are  Involved  In  the  storage  of  relevant 
Information  for  later  use  In  solving  problems.  The  focus  of  this  discussion 


will  be  on  the  global  structure  of  the  design  task,  particularly  tts  guiding 
control  processes,  and  on  the  manipulation  of  knowledge  within  the  problem 
solving  effort. 

Experts  have  knowledge  concerning  the  overal I  structure  of  a  good  design 
and  of  the  process  of  generating  one.  Using  this  knowledge,  they  direct  their 
actions  to  Insure  that  their  designs  will  satisfy  these  structural 
constraints.  This  Implies  that  skilled  designers  have  knowledge  describing 
the  structure  of  a  design  independent  of  its  content.  This  abstract  knowledge 
about  design  and  design  processes,  along  with  the  set  of  procedures  which 
Implement  these  processes,  will  be  referred  to  as  the  "design  schema."  This 
schema,  which  develops  through  experience  with  software  design,  enables 
efficient  management  of  a  designer’s  resources  In  doing  this  particular 
special Ized  and  complex  task.  We  propose  that  the  generation  of  a  design  Is 
controlled  by  the  Interaction  between  the  design  schema  and  the  more  specific 
knowledge  that  describes  how  to  accomplish  particular  goals. 

A  schema  Is  a  higher  order  knowledge  structure  which  governs  behavior  In 
a  particular  domain  or  activity,  providing  a  broad  abstract  structure  onto 
which  an  exemplar  Is  to  be  mapped.  These  knowledge  structures  specify 
principal  elements  of  a  given  domain  and  Include  mechanisms  which  drive  the 
generation  process  and  which  lead  to  outcomes  which  are  structured  according 
to  conventions  shared  by  expert  members  In  a  discipline.  A  schema  can  be  used 
to  organize  complex  material  Into  constituents  and  may  be  applied  recursively 
to  break  some  of  these  constituents  down  further.  These  same  structures  also 
guide  the  comprehension  process  by  arranging  incoming  Information  so  that  It 


is  structured  according  to  the  underlying  abstract  schema.  Absence  of  an 
appropriate  schema  can  interfere  with  both  the  initial  comprehension  and 
subsequent  recal I  of  a  text. 

The  design  schema  is  used  in  both  the  generation  and  comprehension  of 
software  designs.  The  design  schema  is  not  tied  to  any  specific  problem 
domain,  but  consists  instead  of  abstract  knowledge  about  the  structure  of  a 
completed  design  and  the  processes  involved  in  the  generation  of  that  design. 
It  accounts  for  the  overal I  structure  of  expert  design  behavior  and  the 
similarities  among  experts.  Of  course,  the  design  schema  will  differ  from 
expert  to  expert,  since  their  experiences  with  software  design  will  not  be 
Identical.  However,  the  overall  nature  of  these  schemata  will  be  similar  for 
most  people.  Therefore,  we  choose  to  simpl Ify  this  discussion  by  referring  to 
a  single,  modal  design  schema. 

The  design  schema  develops  as  a  result  of  experience  with  software 
design.  Originally,  a  designer’s  approach  to  this  task  is  assumed  to  Involve 
general  problem  solving  strategies,  such  as  "divide  and  conquer."  As  an 
individual  has  more  and  more  experience  with  this  activity,  these  general 
strategies  are  transformed  into  a  specialized  schema.  The  schema  is  developed 
through  the  addition  of  domain-specific  concepts,  tactics,  and  evaluative 
criteria.  Whenever  a  designer's  specialized  schema  Is  inadequate  to  solve  a 
problem,  more  general  strategies  take  over. 

The  design  schema  Is  assumed  to  include:  I)  a  collection  of  components 
which  partition  the  given  problem  into  a  set  of  meaningful  tasks,  2) 
components  which  add  elements  to  tasks  which  assure  that  they  will  function 


properly  (e.g.,  initialization  of  data  structures  or  of  loops),  3)  a  set  of 
processes  that  control  the  generation  and/or  comprehension  of  designs,  and  4) 
evaluation  and  generation  procedures  which  ensure  effective  utilization  of 
knowledge.  Each  component  of  the  design  schema  is  composed  of  both  declarative 
and  procedural  knowledge  about  the  abstract  nature  of  the  design  process.  The 
schema  can  be  applied  recursively,  which  leads  to  a  modular  decomposition  of 
the  problem  into  more  and  more  detailed  modules. 

The  schema  can  be  viewed  as  driving  the  generation  of  a  software  design 
by  breaking  up  the  initial  task  into  a  set  of  subproblems.  Knowledge  of  the 
particular  subproblems  that  are  Identified  during  this  decomposition  Interacts 
heavily  with  the  schema.  However,  the  design  schema  Itself  does  not  contain 
knowledge  about  any  particular  class  of  problems.  The  schema  can  be  applied  to 
the  original  problem  or  to  any  subproblem  at  a  lower  level.  The  recursive 
application  of  the  design  schema  to  subproblems  enables  decomposition  of  each 
problem  Into  a  manageable  set  of  tasks. 

How  the  decomposition  proceeds  depends  upon  the  designer,  the  designer's 
experience  and  the  problem  at  hand.  There  are  several  decomposition 
strategies  that  a  designer  can  use  to  guide  the  process.  One  strategy  Is  to 
break  the  problem  into  Input,  process,  and  output  elements.  While  there  are 
other  strategies  that  could  be  used  to  decompose  some  problems,  the  Input- 
process-output  strategy  is  preeminently  used.  In  order  to  keep  this 
discussion  more  concrete,  we  will  describe  decomposition  in  terms  of  this 
prevail Ing  strategy. 

The  Initial  pass  at  decomposition  results  In  a  representation  of  the 
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problem  that  Is  a  simplified  "solution  model"  of  the  system.  That  is,  a  model 
is  devised  specifying  a  set  of  tasks  that  will  solve  the  problem  and  a  control 
structure  for  these  tasks.  It  is  then  expanded  Into  c  set  of  well-defined 
subproblems.  The  solutions  to  these  subproblems  represent  a  solution  to  the 
original  design  problem.  This  process  of  decomposition  is  now  applied  to  each 
subproblem  In  turn,  resulting  In  more  and  more  detailed  plans  of  what  should 
be  done  to  accompl Ish  the  task.  Once  an  Individual  selects  a  given  element  to 
refine  further,  the  schema  Is  assumed  to  execute  to  completion,  developing  a 
solution  model  for  that  element  and  refining  It  into  a  more  detailed  plan.  If 
any  of  the  elements  resulting  from  this  process  are  complex  (  i.e.,  accomplish 
multiple  functions  that  are  not  recognized  as  having  known  solutions),  the 
schema  Is  called  recursively  to  reduce  them  to  the  next  level  of  detail. 

The  application  of  the  schema  to  an  element  of  a  design  causes  a  set  of 
high  level  goals  and  procedures  for  accomplishing  those  goals  to  be  activated. 
Thus,  the  schema  Includes  procedures  which  examine  Information  relevant  to  the 
expansion  of  a  given  element,  critique  potential  solutions,  generate 
alternative  solutions  for  a  subproblem,  etc.  The  Input  component,  for 
example,  finds  Information  that  must  be  passed  to  a  process  component  before 
the  actual  processing  can  be  Initiated.  If  the  chosen  Input  data  structure  Is 
complex,  that  Is,  requires  seme  degree  of  processing  itself  to  generate  the 
appropriate  data  structure,  then  a  new  subproblem  Is  generated  as  a  descendant 
of  the  original  one. 

The  design  schema  represents  the  global  organization  of  a  designer's 
professional  knowledge.  As  such,  it  will  Impact  almost  every  facet  of  the 
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designer's  behavior  In  the  domain.  Nevertheless,  the  design  schema  does  not 
encompass  a  person's  knowledge  of  specific  facts  In  computer  science  or 
understanding  of  how  things  function  in  the  real  world.  There  are  undoubtedly 
other  aspects  of  this  domain  which  should  not  be  subsumed  under  the  schema, 
but  our  theory  Is  not  sufficiently  developed  to  Isolate  them  at  this  point. 

The  decomposition  process  uses  two  additional  problem  solving  strategies. 
The  first  can  be  described  as  problem  solving  by  analogy,  or,  to  use  Sussman's 
(1977)  term,  "the  debugging  of  almost  right  plans".  When  the  solution  model 
generated  for  a  given  subproblem,  or  some  part  of  It,  Is  recognized  as  being 
analogous  to  an  already  understood  algorithm,  that  algorithm  Is  evaluated  for 
applicability  In  the  current  context.  If  It  is  found  to  be  reasonably 
applicable.  It  is  debugged  and  incorporated  Into  the  developing  solution. 

This  attempt  to  retrieve  previous  solutions  is  Invoked  once  a  solution  model 
has  been  derived,  but  before  any  further  refinement  takes  place. 

The  second  method  can  be  characterized  as  problem  solving  by 
understanding.  This  Is  prominent  in  cases  where  an  element  Identified  by 
application  of  the  design  schema  Is  not  understood  In  enough  detail  for  the 
design  schema  to  be  applied  to  it.  The  designer's  knowledge  of  the  problem 
area  In  question,  as  wel I  as  of  computer  science.  Is  then  used  to  refine  the 
understanding  of  this  element.  This  method  may  be  employed  at  any  point  in  the 
solution.  It  Is  most  frequently  applied  when  developing  a  solution  model,  but 
can  also  be  applied  during  refinement  of  a  subproblem. 

In  addition  to  controlling  the  overal I  problem  solving  process,  the 
design  schema  has  some  coordination  and  storage  functions.  Successful  solution 
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of  a  design  problem  requires  that  Information  generated  during  each  problem 
solving  episode  be  stored  In  long-term  memory.  This  Information  must  be 
Interconnected  with  the  expert's  knowledge  about  computer  science  as  well  as 
with  the  developing  solution.  Much  of  what  goes  on  can  be  described  as  the 
development  of  an  understanding  of  the  problem.  The  Information  generated 
during  these  understanding  phases  must  be  stored  such  that  It  can  be  retrieved 
later  for  the  solution  of  other  subproblems.  The  design  schema  ensures  that 
successive  episodes  are  organized  so  Information  generated  can  be  stored  In  a 
coherent  representation  of  the  developing  solution. 

The  utilization  of  memory  Is  Influenced  by  Its  organization  and  by  the 
effectiveness  of  the  abstract  cues  provided  by  the  schema.  Experience  enables 
concepts  to  be  linked  on  the  basis  of  the  utility  of  considering  the  concepts 
together.  This  usefulness  can  be  defined  In  terms  of  concepts  which 
frequently  occur  In  the  same  context  (e.g.,  linked  lists  and  efficient 
Insertion  and  deletion  of  Items  at  random  places  within  the  list)  or  which  are 
alternative  solution  techniques  To  similar  problems  (e.g.,  c-  symbol  table  may 
be  represented  as  a  hash  table  or  as  a  static  tree  table). 

When  a  computer  science  concept  Is  learned,  that  concept  Is  associated 
with  the  context  In  which  It  is  learned.  For  example,  one  might  first  learn 
about  a  particular  data  structure  In  the  context  of  a  certain  problem.  Later, 
In  another  problem  which  would  be  appropriate  for  this  type  of  data  structure, 
one  might  fall  to  apply  this  new  concept,  since  the  current  context  might  not 
encourage  Its  retrieval.  Eventually,  through  experience  with  the  concept  In 
many  other  contexts.  It  becomes  I  Inked  to  more  abstract  conditions  for  Its 
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use.  Further,  as  a  person’s  design  schema  develops  such  that  It  can  manage 
the  complexity  of  alternate  solutions,  this  concept  would  become  connected  to 
the  concepts  of  other  data  structures  which  would  be  considered  In  similar 
contexts.  Thus  memory  organization  Is  altered,  reflecting  the  designer's 
developing  schema  and  previous  experiences. 

The  major  control  processes  of  the  design  schema  are  summarized  as  a  set 
of  very  abstract  production  rules  in  Figure  1.  Each  rule  encapsulates  a 
complex  subprocess  that  an  expert  may  use  while  generating  a  software  design. 
The  rules  are  an  attempt  to  capture  the  global  control  processes  only;  many 
aspects  of  the  design  schema  are  not  addressed  at  al I.  In  particular,  no 
reference  Is  made  to  the  processes  that  generate  alternative  solutions  or 
critique  designs,  or  to  the  memory  coordination  functions  that  the  schema 
performs.  Moreover,  the  rules  only  refer  to  the  generation  of  a  design;  they 
do  not  encompass  Its  comprehension. 


Insert  Figure  1  about  here 


The  goal  of  software  design  is  to  break  down  a  problem  Into  a  set  of 
subprocesses  which  accomplish  the  task.  After  the  Initial  decomposition, 
there  may  be  multiple  subproblems  to  be  solved.  The  designer  must  have  a  way 
of  selecting  a  problem  to  work  on  from  the  currently  unsolved  subproblems. 

The  selection  rule  (Rule  1)  provides  a  coherent  way  of  determining  what 
problem  to  tackle  next.  The  rules  assume  that  the  list  of  unsolved 
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subproblems  are  stored  on  an  agenda.  The  selection  rule  results  in  one  of  them 
being  marked  as  a  current  subproblem.  The  other  rules  are  applied  to  this 
prob I em. 

The  usual  order  In  which  a  designer  attempts  subproblem  solution  Is  top- 
down,  breadth-first.  The  design  schema  causes  each  element  of  the  current 
iteration  to  be  expanded  to  the  next  level  of  detail.  This  expansion  continues 
until  a  new  representation  of  the  complete  solution  Is  developed  at  the  next 
level  of  detail.  Solving  the  problem  top-down,  breadth-first  ensures  that  all 
of  the  Information  about  the  current  state  of  the  design  at  one  level  of 
abstraction  will  be  available  to  the  next  iteration. 

One  reason  for  this  strategy  is  that  the  elements  of  a  developing  design 
can  Interact  with  each  other.  Although  one  of  the  heuristics  that  guides  the 
decomposition  process  is  the  attempt  to  define  subproblems  that  do  not 
interact  or  Interact  only  weakly,  this  is  not  always  possible.  Further 
refinement  of  one  element  may  require  knowledge  of  decisions  that  will  be  made 
In  developing  a  not-yet-cons! dered  element. 

A  designer  may  choose  to  deviate  frcm  this  order.  These  deviations  are 
dictated  by  individual  differences  in  design  style,  in  the  amount  of  knowledge 
that  the  designer  may  have  concerning  the  problem,  or  in  differences  in  the 
solution  model.  The  solution  model  with  its  various  constituents  may  enable  a 
designer  to  recognize  that  a  solution  relevant  to  the  current  problem  Is 
known.  This  solution  then  can  be  adapted  to  the  current  situation.  Also,  the 
representation  of  each  element  of  the  solution  model  may  enable  a  designer  to 
estimate  their  relative  difficulties  or  to  identify  potential  Interactions 
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which  Impact  further  development  of  the  design.  The  realization  that  one  or 
more  constituents  have  known  solutions,  are  critical  for  success,  present 
special  difficulties,  etc.,  can  cause  the  designer  to  deviate  from  a  top-down, 
breadth-first  expansion  of  the  overall  design  by  assigning  a  higher  priority 
to  a  particular  constituent. 

Once  a  subproblem  has  been  selected,  the  designer  attempts  to  derive  a 
solution  model  for  It  (Rule  2).  Recall  that  the  solution  model  is  an  abstract 
simplified  description  of  elements  of  the  subproblem’s  solution.  This 
solution  model  Is  the  basis  for  all  succeeding  work  on  this  problem.  Rules  2 
through  5  describe  the  processes  that  may  result  tn  the  generation  of  the 
solution  model  for  the  current  subproblem.  If  the  current  subproblem  Is 
perceived  to  be  complex,  the  designer  must  first  undertake  to  reformulate  It 
before  a  solution  model  can  be  generated.  Rule  3  represents  the  process  by 
which  Information  relevant  to  the  subproblem  Is  considered,  and  a  new  more 
understandable  problem  Is  produced.  Once  It  is  precisely  formulated,  a 
solution  model  Is  generated  If  the  problem  requires  further  decomposition 
(Rule  5).  If  the  problem,  once  understood.  Is  sufficiently  simple.  It  is 
marked  as  solved  and  is  not  further  considered  (Rule  4). 

The  next  set  of  rules  (Rules  6  through  10)  encompass  the  processes  by 
which  a  designer  attempts  to  retrieve  from  memory  a  previously  constructed 
solution  to  all  or  part  of  the  current  subproblem.  First,  the  solution  mode! 
for  this  problem  Is  used  as  a  retrieval  cue  to  access  potential  solutions  In 
memory  (Rule  6).  These  solutions  are  then  evaluated  for  their  usefulness  In 
the  current  context  (Rule  7).  The  rules  give  a  simplified  characterization  of 
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the  results  of  this  evaluation  process.  The  solution  is  eirher  accepted  as  is 
(Rule  8),  modified  to  fit  the  current  situation  (Rule  9),  or  rejected  (Rule 
10). 

If  no  usable  solution  to  the  current  subproblem  Is  found,  the  solution 
model  Is  refined  into  a  collection  of  well-defined  subproblems  (Rule  11). 

This  refinement  process  takes  into  account  data  flow;  functional  analysis; 
aesthetic,  practical,  and  other  criteria;  and  implementation  considerations. 
Each  new  subproblem  thus  generated  is  added  to  the  agenda.  The  set  of  rules 
Is  applied  to  the  subproblems  on  the  agenda  until  ail  problems  are  considered 
to  be  solved. 

The  theory  just  presented  describes  a  mechanism  by  which  experts  are  able 
to  Integrate  and  structure  their  high  level  knowledge  of  software  design. 

While  experts  In  the  field  should  manifest  mature  design  schemata,  we  would 
not  expect  beginning  designers  to  show  evidence  In  their  behavior  of  this 
complex  organization.  Therefore,  many  differences  we  might  observe  between 
experts  and  novices  can  be  attributed  to  differences  In  the  state  of 
development  of  their  design  schemata. 

A  COMPARISON  OF  EXPERT  AND  NOVICE  DESIGN  PROCESSES 

The  processes  Involved  In  designing  software  are  learned  through 
experience.  To  examine  their  development,  we  collected  thinking  aloud 
protocols  from  people  at  various  skill  levels.  This  set  of  protocols  forms  a 
rich  data  base  of  evidence  about  the  problem  solving  processes  used  in 
software  design.  There  are,  of  course,  many  similarities  In  the  way  experts 
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and  novices  approach  this  process;  subjects  at  different  levels  used  many  of 
the  same  global  processes.  Differences  as  a  function  of  expertise  fall  into 
two  major  categories:  the  processes  used  to  decompose  the  problem  and  solve 
individual  subproblems,  and  the  representation  and  utilization  of  relevant 
knowledge.  In  this  section,  the  similarities  and  differences  among  subjects 
will  be  discussed  and  related  to  the  theoretical  Ideas  proposed  above. 

Subjects  and  Materials 

Four  of  the  subjects  were  experienced  designers.  They  include  a 
professor  of  electrical  engineering  (S35),  two  graduate  students  in  computer 
science  (S2  and  S5),  both  of  whom  had  worked  as  programmers  and  designers  for 
several  years,  and  a  professional  systems  analyst  with  over  ten  years 
experience  (S3). 

The  five  novices  were  undergraduate  students  recruited  from  an  assembly 
language  programming  class.  They  had  all  taken  from  four  to  eight  computer 
science  courses;  most  had  had  part-time  programming  jobs.  While  these 
subjects  are  moderately  experienced  programmers,  they  have  little  experience 
with  software  design  per  se.  We  selected  two  subjects  from  this  group  (S17 
and  S19)  and  examined  their  thinking  aloud  protocols  in  detail.  Both  these 
subjects  had  taken  a  course  that  specif  leal ly  taught  software  design. 

We  also  collected  a  protocol  from  a  subject  with  no  software  design 
experience  (S25,  whom  we  will  call  a  pre-novice).  This  subject  has  taken 
several  programming  courses  and  has  written  programs  In  the  course  of  the 
research  in  which  she  is  involved.  Her  experience  differs  from  the  novices  In 
two  ways:  her  formal  training  has  dealt  solely  with  the  practical  aspects  of 
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programming,  and  therefore  she  has  tittle  knowledge  of  the  theoretical 
constructs  of  computer  science;  and,  all  of  her  programming  experience  has 
been  statistical  programming  In  FORTRAN. 


Insert  Figure  2  about  here 


The  particular  problem  given  to  the  subjects  is  to  design  a  page-keyed 
indexing  system.  The  problem  specifications  are  shown  In  Figure  2.  This 
problem  was  chosen  because  It  is  of  moderate  difficulty  and  understandable  to 
individuals  with  a  wide  range  of  knowledge  of  software  design,  while  not 
requiring  knowledge  of  highly  special Ized  techniques  that  would  be  outside  the 
competence  of  some  expert  subjects.  That  Is,  a  reasonable  design  could  be 
constructed  for  this  task  using  only  the  techniques  taught  In  upper-division 
undergraduate  courses  In  computer  science  or  those  contained  In  standard 
textbooks  on  computer  science  algorithms.  A  variety  of  approaches,  however, 
could  be  taken  to  design  such  a  system. 

The  protocols  of  a  subset  of  the  subjects  were  analyzed  In  detail,  while 
others  were  examined  more  cursorily  to  find  corroborating  evidence.  The 
method  by  which  this  analysis  was  carried  out  and  the  results  obtained  can  be 
found  In  Atwood  and  Jeffries  (1980).  The  discussion  below  is  based  primarily 
on  the  detailed  analysis,  but  examples  have  been  chosen  freely  from  all  the 
protocols. 
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Similarities  Across  Expertise  Levels 

On  a  first  reading  of  these  protocols,  one  Is  struck  by  the  variations  In 
the  design  solutions  as  much  within  expertise  levels  as  across  them.  Both  the 
design  style  of  the  Individual  subject  and  the  set  of  subproblems  he  or  she 
chose  to  attack  make  each  solution  very  different  from  any  of  the  others. 

More  careful  consideration,  however,  brings  up  many  similarities,  both  within 
experience  groups  and  across  al I  the  subjects. 

Almost  all  the  subjects  approached  the  problem  with  the  same  global 
control  strategy:  decompose  the  problem  Into  subproblems.  They  began  with  an 
Initial  sketchy  parse  of  the  problem,  which  we  have  called  the  solution  model. 
Seme  subjects  were  quite  explicit  about  their  solution  models,  while  for 
others  It  was  necessary  to  Infer  the  underlying  model.  Whenever  a  subject 
made  a  quick,  smooth  transition  from  one  element  of  the  solution  to  the  next, 
without  any  overt  consl deration  of  alternatives,  and  without  reference  to 
external  memory,  we  assumed  that  the  solution  model  underlay  this  decision. 

The  solution  models  for  the  Indexer  problem  are  surprisingly  similar  for 
both  experts  and  novices.  In  general,  subjects  decided  to  read  In  the  terms, 
build  some  sort  of  data  structure  to  contain  them,  compare  the  terms  to  the 
text,  associate  the  page  numbers  with  each  term,  and  output  the  terms  and  page 
numbers.  We  do  not  assume  that  this  would  be  true  for  all  software  design 
problems.  The  Indexer  problem  was  chosen  to  be  ’’straightforward";  for  such  a 
problem,  expertise  Is  needed  not  for  the  Initial  solution  model,  but  for  the 
expansion  of  this  model  Into  a  well-defined  set  of  subproblems  and  the  further 
refinement  of  those  subproblems.  Our  results  are  therefore  potentially  limited 
to  similar  straightforward  problems.  In  tasks  for  which  the  determination  of 
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a  solution  model  Is  Itself  a  difficult  task,  quite  different  problem  solving 
methods  may  be  used.  Once  the  Initial  solution  model  was  derived,  the  subjects 
attempted  to  expand  this  Iteratively.  No  subject  went  directly  from  the 
solution  model  to  a  complete  solution.  They  broke  the  problem  Into 
subproblems,  and  refined  the  solution  through  several  levels. 

As  a  group,  the  novices  explored  a  set  of  subproblems  similar  to  those 
examined  by  the  experts.  The  Initial  decomposition  led  to  equivalent 
constituents,  and  In  further  Iterations  the  novices  as  a  group  developed 
subproblems  that  were  still  comparable  to  the  experts.  The  experts  tended  to 
examine  more  subproblems  and  frequently  found  different  solutions.  Even  for 
Idiosyncratic  aspects  of  the  problem,  however  (e.g.,  how  to  treat  hyphenated 
words,  terms  that  cross  page  boundaries),  the  novices  were  as  likely  as  the 
experts  to  Incorporate  a  particular  element  Into  the  solution. 

While  the  novices  applied  the  same  general  problem  solving  methods  as  did 
the  experts,  their  solutions  were  neither  as  correct  nor  as  complete. 
Furthermore,  the  novices  were  not  able  to  apply  the  more  efficient  problem 
solving  processes  that  the  experts  used.  The  novices  were  lacking  In  skills 
In  two  areas:  processes  for  solving  subproblems,  and  ways  of  representing 
knowledge  effectively. 

Subproblem  Solution  Processes 

Decomposition.  When  these  subjects,  both  experts  and  novices,  perceived 
a  particular  problem  to  be  complex,  they  decomposed  It  Into  a  collection  of 
more  manageable  subproblems.  The  experts,  of  course,  were  more  effective  than 
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the  novices  at  doing  this.  They  showed  some  stylistic  differences  in  when  and 
how  they  used  the  decomposition  process,  but  Its  use  Is  pervasive  in  all  four 
expert  protocols. 

S2’s  protocol  Is  an  almost  perfect  example  of  solution  by  repeated 
decomposition.  He  Is  a  proponent  of  design  by  stepwise  refinement;  In  this 
protocol  he  rigidly  adheres  to  such  a  strategy.  His  Initial  decomposition  Is  a 
listing  of  the  major  steps  to  be  accomplished,  little  more  than  a  precise 
reformulation  of  his  solution  model.  On  the  next  iteration  he  adds  a  control 
structure  to  this  collection  of  modules.  Successive  passes  decompose  these 
modules  Into  sets  of  submodules  until  he  Is  satisfied  that  he  has  reached  the 
level  of  primitive  operations. 

S3  also  Iteratively  decomposes  the  problem  In  a  top-down,  breadth- first, 
beginning  to  end  manner.  Her  style,  and  the  design  she  eventually  produces. 

Is  similar  to  that  of  S2,  except  that  her  protocol  Is  Interspersed  with 
digressions  that  relate  to  subproblems  at  other  levels  and  at  other  positions 
In  the  problem.  S3  also  attempts  fewer  Iterations  than  S2,  bringing  the 
problem  to  a  slightly  higher  level  of  detail  In  two  passes  as  S2  did  In  five 
or  six.  In  fact,  at  the  end  of  the  protocol,  she  real Izes  that  the  second 
Iteration  Is  so  much  more  detailed  than  the  first  that  It  taxes  her  ability  to 
comprehend  the  solution.  She  then  Incorporates  a  sketchy  third  Iteration  at  a 
"higher"  level  than  the  previous  one. 

After  articulating  hfs  problem  model,  S5  notes  tfiat  In  order  to  know  how 
to  read  the  term  file  Into  a  data  structure,  he  needs  to  know  more  about  how 
the  matcher  works.  He  then  proceeds  to  work  out  the  design  of  the  matcher  and 
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Its  associated  data  structures.  This  places  him  directly  In  the  middle  of  the 
decomposition  tree,  working  simultaneously  on  two  distinct  branches.  After 
ascertaining  how  the  match  process  would  operate,  he  proceeds  to  flesh  out  the 
design,  proceeding  from  here  In  a  top-down,  breadth-first,  beginning  to  end 
manner. 

The  core  of  S35*s  solution  Is  an  algorithm  he  retrieves  that  defines  the 
term  data  structure  and  the  matcher.  Using  this  as  a  base,  he  builds  the 
design  in  a  top-down,  breadth- f I rst  manner,  although  he  does  not  expand  It 
beginning  to  end.  The  reason  for  this  Is  that  he  defines  the  problem  In  terms 
of  data  structures  derived  fran  his  original  functional  analysis  of  the 
problem  decomposition.  Occasional  deviations  from  this  breadth-first  order 
occur  when  he  attempts  to  define  low  level  primitive  actions  which  are  the 
building  blocks  of  his  design. 

All  of  these  experts  demonstrate  the  existence  of  a  pol Ished  design 
schema  and  a  sophisticated  ability  to  use  the  decomposition  method  to  expand 
their  designs.  Differences  across  experts  were  In  part  dictated  by  disparate 
design  styles,  but  to  a  great  extent  were  due  to  differences  In  their 
knowledge  of  and  ability  to  retrieve  a  relevant  solution  plan. 

The  novices,  on  the  other  hand,  were  much  less  effective  In  their  use  of 
the  Iterative  decomposition  method.  They  seem  to  lack  the  more  subtle  aspects 
of  the  design  schema.  A  wel l-developed  schema  should  guide  the  designer  toward 
the  production  of  a  "good”  design,  as  opposed  to  one  that  accomplishes  the 
task  "by  hook  or  by  crook."  This  means  that  considerations  of  efficiency, 
aesthetics,  etc.,  should  Influence  the  manner  In  which  design  elements  are 
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expanded.  There  Is  no  evidence  of  this  In  the  novices.  Furthermore,  the  schema 
should  Include  procedures  that  enable  designers  to  make  resource  decisions 
about  the  order  In  which  to  expand  the  modules,  e.g.,  most  difficult  first,  or 
a  module  that  uses  a  data  structure  might  be  designed  before  the  one  that 
produces  It.  In  the  novices  that  we  have  examined  In  detail,  we  see  no 
deviations  from  the  default  breadth-first,  beginning  to  end  cons  I  derat loti  of 
modu I es . 

The  best  of  the  novices  was  SI 9.  He  Is  the  only  novice  that  Iterates  the 
problem  through  more  than  two  levels  of  decomposition.  However,  beyond  the 
first  level,  he  Is  unable  to  recursively  apply  some  of  the  same  decomposition 
strategies  he  used  earlier.  S19  gets  particularly  bogged  down  In  his 
"compare"  routine,  rewriting  It  several  times  without  complete  success.  On 
each  attempt  he  simply  tries  to  generate  a  solution  through  brute  force  by 
writing  down  the  necessary  steps.  There  is  no  hint  of  having  generated  a 
model  for  this  process  nor  of  any  attempt  to  further  decompose  It. 

SI 7  was  able  to  decompose  the  Indexer  problem  and  to  generate  an  adequate 
Initial  pass  at  a  solution.  He  then  attempted  fo  expand  his  solution  (mostly 
at  the  urging  of  the  experimenter).  However,  he  makes  no  attempt  to  further 
decompose  his  chosen  modules.  Each  subsequent  iteration  simply  repeats  the 
previous  solution,  adding  on  new  "facts"  as  he  discovers  them.  For  example, 
at  one  point  he  considers  the  possibility  thal  a  term  straddles  pages.  He 
changes  his  design  to  accommodate  this,  but  he  does  so  by  augmenting  existing 
elements,  not  by  decomposing  them  Into  submodules.  This  sort  of  behavior 
Indicates  that  S17  Is  unable  to  recursively  apply  the  design  schema. 
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Another  of  the  novices  writes  down  a  solution  In  terms  of  steps.  Instead 
of  modules.  The  distinction  between  steps  and  modules  Is  necessarily  a  fuzzy 
one.  However,  a  set  of  steps  differs  from  a  modular  decomposition  In  that 
steps  have  no  hierarchical  structure,  steps  of  very  different  levels  of  detail 
may  occur  together,  and  steps  have  only  a  primitive  control  structure.  In  the 
second  Iteration  of  his  design,  this  novice  merely  produces  a  similar  set  of 
steps,  more  specif  leal ly  tied  to  the  architecture  of  a  particular  computer. 

He  appears  to  understand  that  a  problem  should  be  broken  down,  but  has  not 
developed  a  design  approach  which  decomposes  Into  subproblems. 

Although  the  novices  have  not  Incorporated  the  more  subtle  aspects  of  the 
design  schema  into  their  behavior,  they  can  apply  the  basic  principles.  The 
pre-novice,  S25,  however,  has  not  developed  even  a  rudimentary  design  schema. 
First,  S25's  protocol  Is  qualitatively  different  from  those  of  the  computer 
science  majors.  They  produced  designs  which,  while  differing  In  many  details 
from  those  of  the  experts,  were  at  least  Marginally  acceptable  solutions  to 
the  problem.  S25  did  not  produce  a  design.  She  generated  a  mixture  of  FORTRAN 
code  and  comments  that  together  could  be  taken  as  a  partial  solution  to  the 
task  of  writing  a  program  to  solve  the  Indexer  problem.  Moreover,  she  got 
quite  bogged  down  In  the  selection  of  data  structures  for  the  text  and  terms 
and  In  the  Implementation  of  procedures  to  compare  Items  In  these  structures. 
Because  of  these  difficulties,  she  eventually  abandoned  the  task  without 
generating  a  complete  solution. 

S25  made  no  attempt  to  decompose  the  problem;  she  did  not  seem  to  be 
using  any  kind  of  an  overall  mode!  to  guide  her  solution.  She  let  the  problem 
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description  and  the  portion  of  the  "program"  already  written  direct  her 
expansion  of  a  solution.  Information  did  not  seem  to  accumulate  over  the 
solution  attempt;  she  attacked  the  same  stbproblem  repeatedly,  but  often  made 
no  progress  beyond  the  Initial  attempt.  She  did  seem  to  understand  that  Input, 
process,  and  output  components  were  needed,  but  this  was  not  sufficient  to 
produce  a  correct  Initial  decomposition  of  the  problem. 

We  take  this  continuum  of  more  effective  use  of  the  decomposition  method 
with  Increasing  experience  as  strong  evidence  for  both  the  reality  and  the 
usefulness  of  the  design  schema.  Another  aspeci  of  expertise  that  Is  apparent 
In  these  protocols  Is  the  ability  of  the  experts  to  generate  and  evaluate 
alternative  solutions  to  a  subproblem. 

Evaluation  of  Alternatives.  When  the  experts  are  trying  to  determine 
whether  a  particular  plan  Is  actually  a  good  solution  to  a  subproblem,  they 
state  alternative  solutions  and  select  among  them.  S3,  for  example, 
explicitly  mentions  that  the  page  numbers  coulc  be  stored  In  an  array  or  a 
linked  list.  She  does  some  calculations  of  the  relative  storage  requirements 
of  each  and  chooses  the  linked  list  because  It  Is  more  efficient.  S35  spends 
some  time  considering  two  ways  of  Implementing  his  term  data  structure;  one  Is 
time  efficient,  and  the  other  Is  storage  efficient.  He  concludes  that, 
without  knowledge  of  the  actual  computer  system  to  be  used,  he  does  not  have 
enough  Information  to  decide  which  Is  better.  He  chooses  to  leave  both  as 
alternatives. 

The  novices  seldom  consider  more  than  one  possible  solution  to  any 
subproblem.  From  the  marginal  utility  of  seme  of  the  solutions  they  do 


retrieve.  It  seems  that  they  are  hard  pressed  to  find  even  one  solution  to 
many  subproblems.  For  example,  at  one  point,  SI 9  says  "This  might  be  the  only 
way  1  can  think  of  to  be  able  to  do  this.  It’s  going  to  be  awful  expensive," 
and  elsewhere,  "It’s  Inefficient  and  expensive,  but  It's  easy."  He  seems  to 
have  some  ability  to  critique  his  solutions,  but  Is  at  a  loss  to  correct  the 
deficiencies  he  finds. 

In  the  few  cases  In  which  the  novices  chocse  among  alternatives,  they 
make  simple  dichotomous  decisions  (do  X  or  not  X).  Their  decision  is 
invariably  made  on  the  basis  of  programming  convenience.  For  example,  S19 
notices  that  a  term  could  straddle  a  page.  He  spends  seme  time  deciding 
whether  or  not  to  permit  this,  and  decides  that  it  Is  easier  not  to  al low  It, 
although  this  solution  Is  unlikely  to  be  rea1 istlc  In  terms  of  Indexing  a 
textbook. 

Retrieval  of  Known  Solutions.  One  of  the  features  of  the  decomposition 
technique  Is  that  It  enables  the  designer  to  convert  a  problem  Into  a  set  of 
simpler  subproblems,  eventually  reaching  the  point  where  all  the  subproblems 
have  known  solutions.  While  the  novices  attempt  to  employ  decomposition,  we 
see  no  evidence  that  they  do  so  In  order  to  arrive  at  a  set  of  known 
solutions.  The  experts,  in  contrast,  seem  to  have  a  large  repertory  of 
solutions  and  of  methods  for  decomposing  a- problem.  The  clearest  examples  of 
this  are  when  some  of  the  expert  subjects  were  able  to  recal I  and  apply  a 
single  solution  to  the  major  problem  tasks.  $35  and  S5  both  attempted  this. 

S35,  after  reading  the  specifications.  Immediately  states  "well,  the 
obvious  answer  to  this  Is  to  use  the  technique  of  Aho  and  Coraslck,  which 


32 


appeared  In  CACM  "  (Aho  and  Coraslck,  1975).  This  article  describes  an 
algorithm  for  searching  text  for  embedded  strings.  He  says,  "basically  what 
you  do  Is  you  read  the  term  file,  and  you  create  a  finite  state  machine  from 
It.  And  then  you  apply  this  finite  state  machine  to  the  text."  S35  then  spends 
the  next  two  hours  expanding  this  solution  Into  a  complete  design. 
Incorporating  the  Idiosyncrasies  of  this  problem  (e.g.,  that  the  page  number 
Is  not  known  until  the  end  of  the  page)  Into  fhls  general  algorithm.  It  Is 
apparent  that  hls  understanding  of  the  algorithm  strongly  Influences  the 
expanding  design  and  many  of  the  design  decisions. 

After  hls  Initial  parsing  of  the  problem,  S5  notes  that  the  match  process 
Is  critical  for  an  efficient  and  successful  solution.  This  reminds  him  of  a 
published  algorithm  that  may  be  applicable  to  this  situation.  "Now  my 
Immediate  Inclination  Is  to,  about  three  CACMs  ago,  this  particular  problem 
was  discussed."  The  algorithm  (Boyer  and  Moore,  1977)  he  refers  to  Is  similar 
to  the  one  recal led  by  S35. 

S5's  memory  of  this  algorithm  Is  somewhaf  sketchy,  though,  and  he  Is 
unsure  of  how  It  Interacts  with  the  rest  of  the  design.  He  works  through  the 
match  process  and  Its  associated  data  structure  In  some  detail.  The  resulting 
algorithm  Is  similar  to,  but  not  identical  with,  the  published  algorithm.  In 
a  very  real  sense,  he  constructs  an  original  solution  that  Incorporates  many 
of  the  features  that  he  recalls  from  the  Boyer  and  Moore  algorithm.  From 
there,  he  proceeds  Iteratively  through  refinements  of  the  design  as  a  whole. 

Our  other  two  experts,  S2  and  S3,  did  nol  retrieve  a  single  solution  to 
the  major  tasks,  but  they  frequently  solved  subproblems  by  Incorporating  plans 
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that  they  had  used  before.  For  example,  S2  uses  a  linked  list  to  store  the 
page  numbers.  He  notes  that  the  Insertion  procedure  Is  somewhat  tricky  to 
Implement;  he  would  prefer  to  refer  to  one  of  his  earl  ter  programs,  rather 
than  spend  the  time  to  work  out  the  details  again.  S3,  when  considering  the 
problem  that  hyphens  can  serve  two  distinct  functions  In  the  text  (as  part  of 
a  word  or  to  divide  a  word  at  a  line  boundary),  mentions  that  she  knows  of  a 
similar  case  that  was  solved  by  requiring  that  distinct  characters  be  used  In 
each  case. 

The  experts  not  only  retrieve  solution  plans  to  a I  I  or  part  of  the 
problem,  but  they  are  able  to  modify  those  solutions  to  fit  the  current 
situation.  S35's  design  was  a  modification  of  a  wel 1  understood  plan.  S5 
only  retrieved  the  skeleton  of  a  plan;  he  spent  most  of  his  time  augmenting 
and  altering  this  plan  to  fit  the  actual  problem. 

The  novices  show  no  evidence  that  they  are  trying  to  adapt  previously 
learned  solutions  to  any  part  of  this  problem.  No  novice  ever  made  a 
statement  like  "this  is  Just  like  X"  or  "I  did  something  similar  when  Y."  They 
do  retrieve  solutions,  but  only  at  the  lowest  levels.  For  example,  S17 
decided  that  he  would  flag  the  first  empty  position  for  each  term  In  his  page 
number  array.  This  Is  a  solution  to  the  problem  of  locating  the  current  end  of 
the  page  number  1 1st,  but  It  Is  far  from  the  best  one.  SI 7  makes  no  attempt 
to  alter  this  solution  so  that  It  accomplishes  this  In  a  more  efficient 
manner.  It  Is  not  clear  whether  this  Is  due  to  his  Inability  to  real Ize  the 
Inefficiencies  In  this  solution,  or  whether  he  simply  does  not  know  what 
modifications  to  make. 
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Knowledge  Representation 


Access  to  Background  Knowledge.  The  experts  demonstrated  an  Impressive 
ability  to  retrieve  and  apply  relevant  Information  In  the  course  of  solving 
this  problem.  The  appropriate  facts  are  utilized  just  when  they  are  needed; 
Important  Items  are  seldom  forgotten.  Moreover,  they  devote  little  time  to 
the  consideration  of  extraneous  Information.  In  contrast,  the  novices*  lack 
of  an  adequate  knowledge  organization  for  solving  this  problem  Is  apparent 
throughout  their  protocols.  They  frequently  fall  to  correctly  apply  knowledge 
that  Is  needed  to  solve  the  problem,  and  the  Information  that  they  generate  In 
the  course  of  solving  the  problem  Is  often  not  avallablo  to  them  when  It  Is 
most  needed.  We  attribute  this.  In  part,  to  the  Inadequacy  of  the  organizing 
functions  provided  by  their  Immature  design  schemata. 

The  novices'  failure  to  apply  relevant  knowledge  can  be  seen  In  their 
selection  of  a  data  structure  for  the  terms  and  page  numbers,  tach  term  can 
potentially  have  a  very  large  number  of  page  references  associated  with  It, 
but  the  typical  entry  will  have  only  a  few  references.  The  selected  data 
structure  should  allow  for  the  occasional  term  with  an  extreme  number  of 
references  without  having  to  reserve  large  amounts  of  storage  for  every  term. 

A  linked  list  Is  a  data  structure  which  allows  these  properties.  Our  experts 
used  a  linked  list  to  hold  the  page  numbers  associated  with  each  term.  The 
course  from  which  the  novices  were  recruited  had  recently  covered  linked 
lists.  In  addition,  most.  If  not  all,  of  them  had  been  exposed  to  this 
concept  In  other  courses.  Thus,  we  are  confident  that  the  subjects  were 
familiar  with  the  construct.  In  spite  of  this,  none  used  such  a  list  to  hold 
the  page  numbers.  They  al  I  stored  page  numbers  In  an  Immense  array.  Several 
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subjects  mentioned  that  such  an  arrangement  was  Inefficient,  but  none  were  led 
to  change  It. 

The  construction  of  a  linked  list  Is  a  technique  with  which  these 
subjects  are  familiar.  However,  their  understanding  of  when  that  technique  Is 
applicable  does  not  extend  to  the  current  situation.  Understanding  of  the 
conditions  under  which  sane  piece  of  knowledge  Is  applicable  Is  one  way  In 
which  knowledge  about  a  domain  becomes  integrated.  For  this  Information  to  be 
useful.  It  cannot  exist  as  a  set  of  isolated  facts,  but  must  be  related  to 
other  knowledge.  For  example,  linked  lists  would  be  Interrelated  with 
Information  such  as  additional  types  of  data  structures  and  methods  of  gaining 
storage  efficiency  in  a  program.  The  experts  have  achieved  this  Integration 
of  concepts,  while  It  Is  still  undergoing  development  In  the  novices. 

Episodic  Petrleval.  The  design  schema  mediates  retrieval  of  information 
within  a  problem  solving  effort  as  well  as  retrieval  of  relevant  background 
knowledge.  The  experts,  with  their  more  mature  design  schemata,  were  better 
able  to  accumulate  useful  Information  during  the  course  of  the  solution 
attempt  and  to  apply  It  at  the  relevant  fime.  The  clearest  example  of  this  Is 
S3*s  handling  of  the  Issue  of  hyphens  In  the  problem. 

Early  In  the  protocol,  S3  notices  that  the  text  ma/  contain  hyphens  and 
that  this  canpllcates  the  comparison  process.  At  this  point,  S3  only  notes 
this  "as  being  a  problem  when  you  come  around  to  con  paring."  This  Issue  Is  not 
considered  for  long  portions  of  the  protocol,  but  It  emerges  whenever  a  module 
that  Is  related  to  the  compare  operation  or  accessing  the  text  Is  considered. 
S3  never  mentions  hyphens  when  she  Is  expanding  the  "read  terms"  module,  but 
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It  Is  one  of  the  first  things  mentioned  when  the  "construct  Index"  module  Is 
taken  up. 

In  contrast,  the  novices  are  not  only  less  able  to  generate  relevant 
Information,  but  the  Information  that  they  do  generate  Is  not  stored  In  an 
easily  retrievable  form.  SI 9  provides  an  Illustration.  Early  In  his  solution, 
he  notes  that  a  term  may  straddle  a  page.  He  decides  that  this  possibility 
complicates  the  design  unnecessarily  and  legislates  that  It  will  not  happen. 

He  even  writes  down  this  assumption.  Some-time  later  he  again  notices  that  this 
problem  could  occur.  He  treats  this  as  an  entirely  new  discovery;  no  mention 
Is  made  of  his  earlier  treatment  of  the  topic.  In  fact,  during  this  second 
episode,  he  decides  to  al low  terms  to  straddle  page  boundaries,  but  uses  the 
ending  page  number  Instead  of  the  starting  page  number  as  the  reference.  This 
too  Is  written  down,  but  neither  then  nor  later  does  he  notice  that  It 
contradicts  his  earl ler  assumption. 

Another  example  Is  that  S17  mistakenly  assumes  that  terms  will  be  single 
words,  rather  than  phrases.  In  the  middle  of  the  problem,  while  rereading  the 
specifications  for  seme  other  purpose,  he  notices  the  error  and  comments  on 
corrections  that  must  be  made  to  allow  for  multi-word  terms.  However,  none 
are  Incorporated  Into  his  next  Iteration  of  the  problem,  which  only  deals  with 
single  word  terms.  At  the  end  of  the  session,  he  notices  once  more  that  terms 
are  phrases  and  that  his  design  must  be  modified  to  account  for  that  fact. 

This  failure  to  recall  Information  over  the  course  of  a  single  solution 
attempt  Is  probably  the  result  of  two  handicaps  under  which  the  novices  must 
operate.  First,  the  solution  to  these  problems  consumes  such  a  large  portion 
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of  their  resources  that  they  are  unable  to  monitor  memory  for  other 
potentially  relevant  Information.  Experts  can  avoid  overloading  themselves  by 
utilization  of  the  design  schema.  Second,  their  memory  representation  of  the 
problem  Is  not  organized  In  such  a  way  as  to  facilitate  the  retrieval  of 
previously  generated  information. 

Understanding  of  Concepts.  The  novices  fall  to  have  an  adequate 
understanding  of  many  of  the  basic  concepts  of  computer  science.  These 
undergraduates  are  generally  familiar  with  only  one  machine  (the  CDC6400)  and 
two  or  three  programming  languages.  Much  of  their  understanding  of  the  basic 
concepts  Is  tied  to  their  experience  with  one  or  two  exemplars  of  that  concept 
and  reflects  the  Idiosyncrasies  of  that  experience.  These  mistaken  assumptions 
frequently  lead  to  Inefficient  designs  and  occasionally  to  outright  errors. 

Several  examples  of  incomplete  or  Incorrect  understanding  of  concepts  can 
be  found  In  the  protocols  of  S17  and  SI 9.  S17,  In  particular,  repeatedly 

attempts  to  Incorporate  constructs  Into  his  design  that  he  Is  aware  of,  but 
does  not  fully  understand.  He  tells  the  experimenter  that  the  book  text 
should  be  stored  as  a  "binary  tree";  l.e.,  he  Intends  to  read  In  the  book  text 
and  sort  It  Into  alphabetic  order  (presumably  by  word).  A  binary  tree  Is  an 
efficient  structure  for  repeatedly  searching  ordered  collections  of  Items.  It 
allows  one  to  find  an  arbitrary  Item  In  the  set  with  substantially  less 
searching  than  a  sequential  search  requires.  In  much  the  same  way  that  one 
looks  up  an  entry  In  a  dictionary  or  a  phone  book.  However,  all  the 
Information  as  to  which  word  follows  another,  which  are  necessary  to  Isolate 
phrases  from  the  text.  Is  lost.  S17  has  apparently  learned  some  of  the 
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conditions  under  which  a  binary  tree  should  be  used,  but  he  clearly  does  not 
understand  the  concept  well  enough  to  reject  It  In  this  obviously  unsuitable 
situation. 

Contrast  this  with  the  solution  of  M5.  He  Is  quite  concerned  with 
efficient  storage  of  the  terms  and  the  text.  He  spends  over  an  hour  working 
out  appropriate  data  structures  and  how  They  will  be  searched,  as  opposed  to 
the  minute  or  two  spent  by  S17.  S5's  solution  Is  to  store  the  text  as  a 
string,  and,  for  very  much  the  reasons  mentioned  above,  to  store  the  terms  In 
a  binary  tree.  These  decisions  are  exactly  opposite  to  those  arrived  at  by 
S17 . 

S19's  protocol  shows  that  he  does  nol  completely  understand  the 
difference  between  computer  words  and  English  words.  On  the  computer  that  he 
Is  familiar  with,  a  computer  word  will  contain  an  English  word  of  up  to  ten 
characters,  so  for  many  practical  purposes,  the  distinction  Is  not  needed.  In 
his  term  data  structure,  he  allocates  five  (computer)  words  for  each  term,  one 
for  each  (English)  word.  While  this  might  not  be  the  most  efficient  way  to 
store  the  terms.  It  might  work  for  some  data  sets,  at  least  on  the  CDC6400. 

His  misunderstanding  of  the  difference  gets  him  Into  trouble,  however,  when  he 
tries  to  read  the  text.  He  Initial ly  tries  to  read  It  a  line  at  a  time,  but 
abandons  this  because  he  cannot  determine  how  many  words  are  on  a  line.  He 
then  decides  to  read  the  text  a  word  at  a  time.  His  assumption  that  an  English 
word  !s  a  natural  unit  for  Input  (It  is  not;  It  takes  a  substantial  amount  of 
computation  to  determine  the  word's  boundaries)  Is  due  to  his  confusion 
between  the  two  types  of  words. 
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S3,  on  the  other  hand,  not  only  understands  the  difference  between  the 
two  concepts,  but  Is  also  aware  that  the  overlapping  terminology  Is  confusing. 
When  she  Is  allocating  list  pointers,  she  comments  "the  pointers  themselves 
are  actually  In  a  vector  of  NT  units,  or  words,  well,  computer  words,  I  guess, 
...  (that's  certainly  a  misused  word)."  Thus,  she  Is  sensitive  to  the 
distinction  between  the  concepts  as  well  as  the  confusibl I Ity  In  terminology. 

Yet  another  example  Is  SI  7  *  s  confusion  over  what  a  flag  Is  and  when  to 
use  one.  A  flag  Is  a  variable  that  can  take  on  two  values,  usually  "true"  and 
"false".  It  Is  used  to  Indicate  the  staius  of  some  condition  that  changes 
within  the  program.  SI  7  has  some  understanding  of  the  use  of  flags,  as  he 
Intends  to  "set  a  flag  back  and  forth"  to  signal  the  end  of  the  text  file. 
While  this  Is  not  an  error.  It  Is  not  a  particularly  good  use  for  a  flag,  as 
the  end  of  the  text  file  will  only  be  reached  once,  and  a  simple  test  for  the 
condition  would  be  more  suitable. 

Later  on  In  the  design  he  needs  a  way  to  Indicate  which  terms  have  been 
found  on  the  current  page  before  the  page  number  Is  available.  While  his 
solution  Incorporates  the  Idea  of  setting  a  flag,  he  calls  It  a  "count".  This 
misuse  of  terminology  confuses  him  later  on,  when  he  mistakes  this  "count"  for 
a  count  of  the  number  of  times  each  term  occurs  In  the  entire  text. 

Understanding  of  Implications.  In  addition  to  their  conceptual  failures, 
the  novices  are  often  unable  to  extract  all  the  Implications  of  a  piece  of 
knowledge.  In  particular,  they  are  frequently  unable  to  derive  the 
Implications  of  the  interactions  between  a  task  and  a  computer  Implementation 
of  that  task.  This  Is  exemplified  by  the  difference  In  the  way  the  experts 
and  novices  dealt  with  the  subproblem  which  compares  the  text  and  terms. 
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This  subproblem  is  the  heart  of  this  design,  since  the  efficiency  of  this 
routine  directly  Impacts  the  overall  efficiency  of  the  program.  All  of  the 
experts  treated  the  matcher  as  a  difficult  problem.  They  concerned  themselves 
with  many  aspects  of  it:  whether  comparing  shoUd  be  done  character  by 
character  or  word  by  word,  how  to  organize  the  data  to  minimize  the  number  of 
comparisons  that  are  unsuccessful,  what  constitutes  a  correct  match.  The 
novices,  for  the  most  part,  simply  stated  the  subproblem  and  made  no  further 
effort  to  decompose  it.  They  seemed  to  treat  it  as  too  simple  to  require 
further  consideration.  The  experience  most  of  the  novices  have  with  compare 
procedures  Is  with  those  that  deal  with  comparing  numbers.  For  such  cases,  the 
procedure  is  quite  straightforward.  Questions  about  how  much  to  compare  at 
once  and  how  to  decide  If  a  match  has  occurred  never  arise.  The  novices  did 
not  retrieve  Information  about  factors  that  must  be  considered  in  a  character 
string  compare,  because  they  simply  did  not  understand  the  Implications  of  the 
way  a  computer  compares  data. 

The  novice  protocols  indicate  that  novices  have  mastered  the  Jargon  of 
the  field;  their  comments  are  peppered  wiih  technical  computer  science  terms. 
More  careful  examination,  however,  shows  that  these  terms  do  not  have  the  same 
meaning  for  the  novices  as  they  do  for  the  expert.  This  implies  that  In  some 
sense  design  decisions  that  are  described  by  Ihe  same  words  are  not  the  "same" 
for  people  of  different  experience  levels.  In  addition,  as  the  above  examples 
show,  these  misunderstandings  and  failure1,  to  deduce  relevant  Implications 
frequently  lead  the  novices  astray.  They  confuse  similar  concepts  or  apply  a 
concept  when  It  Is  Inappropriate  or  do  not  take  Into  account  pertinent 


41 


considerations.  In  actual  designs  these  subtle  errors  could  be  disastrous,  as 
they  probably  would  not  be  noticed  until  the  program  was  written.  If  the 
problem  were  serious  enough  that  a  major  change  to  the  design  was  required, 
large  amounts  of  effort  would  have  been  wasted. 

DISCUSSION 

The  decomposition  process  Is  central  to  Ihe  successful  derivation  of  a 
software  design.  It  serves  to  break  a  problem  down  Into  manageable  and 
minimally  Interacting  components.  Thus,  the  task  Is  reduced  to  one  of  solving 
several  simpler  subproblems.  For  experts,  the  decomposition  and  subproblem 
selection  processes  of  the  design  schema  dictate  the  global  organization  of 
their  design  behavior.  They  first  break  the  problem  Into  Its  major 
constituents,  thus  forming  a  solution  mode).  During  each  Iteration, 
subproblems  from  the  previous  cycle  are  further  decomposed,  most  frequently 
leading  to  a  top-down,  breadth-first  expansion  of  the  solution.  The  Iterative 
process  continues  until  a  solution  Is  I dc;nt I f I ed  for  each  subproblem. 

The  data  show  a  range  of  development  In  the  utilization  of  the 
decomposition  process.  At  least  four  distinct  levels  can  be  distinguished. 

The  first  level  Is  exemplified  by  the  pre-novice,  S25,  who  attempted  to  code 
the  major  steps  of  the  solution  directly  In  FORTRAN.  A  novice  designer  at  the 
next  level  derives  a  solution  model  and  converts  It  Into  a  series  of  steps. 
Novices  who  broke  the  problem  Into  steps  were  usually  able  to  Iterate  over  the 
steps  at  least  once,  producing  a  more  del  cl  led  sequence  of  steps. 

The  more  advanced  novices  are  able  to  break  the  problem  Into  meaningful 
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subprobl ems,  using  their  solution  model  as  a  basis.  S17  Is  able  to  carry  out 
this  first  level  decomposition,  but  he  is  unable  to  recursively  apply  this 
strategy.  S19  Is  able  to  recursively  decompose  the  problem  for  the  first  few 
levels,  but  eventually  he  becomes  so  mired  In  details  that  the  strategy  breaks 
down. 

The  experts  manifest  the  fourth  level  of  development  of  the  decomposition 
processes.  They  exhibit  all  three  major  components  of  the  strategy:  1)  They 
break  the  problem  Into  manageable,  minimally  interacting  parts,  2)  they 
understand  a  problem  before  breaking  It  Into  subproblems,  and  3)  they  retrieve 
a  known  solution.  If  one  exists.  S2  and  S3  depended  almost  completely  on  the 
first  two  of  these,  while  S5  and  S35  were  able  to  retrieve  a  known  solution  to 
a  significant  portion  of  the  problem. 

Experts  devote  a  great  deal  of  effort  to  understanding  a  problem  before 
attempting  to  break  it  Into  subproblems.  They  clarify  constraints  on  the 
problem,  derive  their  Implications,  explore  potential  Interactions,  and  relate 
this  Information  to  real-world  knowledge  about  the  task.  The  novices,  on  the 
other  hand,  show  little  Inclination  to  explore  aspects  of  a  subproblem  before 
proposing  a  solution.  This  has  serious  consequences  for  both  the  correctness 
and  efficiency  of  their  designs. 

Expert  designers  employ  a  set  of  processes  that  attempt  to  find  a  known 
solution  to  a  given  subproblem.  Critical  features  of  the  solution  model  are 
used  to  search  for  potentially  applicable  algorithms.  Successful  retrieval 
requires  the  designer  to  have  knowledge  of  relevant  solutions  and  their 
applicability  conditions,  to  be  able  to  retrieve  the  solution  In  a  possibly 


43 


.A  'w.  S 


novel  context,  and  to  adapt  the  solution  to  the  particular  context  of  the 
design  problem.  The  experts  show  themselves  to  be  skilled  at  retrieving 
algorithms  for  use  In  their  designs.  Novices  show  no  evidence  of  recognizing 
the  applicability  of  Information  In  a  novel  situation  that  they  had 
unquestionably  learned  previously.  The  novices'  schemata  are  deficient  In  the 
processes  that  control  the  retrieval  of  Information  for  Integration  Into  their 
designs. 

The  experts  differed  In  their  ability  to  recall  high-level  solutions  to 
the  problem,  specifically,  for  the  matcher  and  Its  associated  data  structures. 
S35  retrieved  an  algorithm  from  the  literature  and  built  his  solution  around 
It.  S5  retrieved  a  skeletal  solution  to  the  same  subproblems.  However,  he 
chose  to  work  out  this  solution  In  seme  detail  before  proceeding  with  the 
remainder  of  the  design.  S2  and  S3  did  not  retrieve  Information  about 
possible  solutions  to  these  subproblems.  Instead,  they  used  the  default 
decomposition  processes  to  Iteratively  refine  the  problem.  Both,  however, 
recalled  numerous  low-level  algorithms  that  they  Incorporated  Into  their 
designs. 

The  objective  of  the  decomposition  process  Is  to  factor  a  problem  Into 
weakly  Interacting  subproblems.  However,  subproblems  can  Interact,  and  the 

Individual  solutions  must  be  Integrated.  This  can  Impose  serious  coordination 
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demands  upon  the  problem  solver  (Simon,  1973).  The  experts  used  two 
components  of  the  design  schema  to  solve  this  coordination  dilemma.  First, 
experts  expand  subproblems  systematically,  typically  top-down,  breadth-first. 
Second,  they  are  able  to  store  detailed  and  wel l-lntegrated  representations  of 
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previous  problem  solving  activities  and  retrieve  them  when  they  become 
relevant. 


Novices  have  difficulty  coordinating  their  activities  because  of 
Ineffective  retrieval  strategies.  Because  they  do  not  recognize  the 
Implications  of  potential  Interactions,  novices  are  often  unable  to  correctly 
Interface  subproblems.  They  also  fall  to  retrieve  and  Incorporate  Information 
acquired  In  the  classroom  and  are  unable  to  Integrate  Information  generated 
during  earlier  parts  of  the  solution  attempt  with  later  efforts.  Thus,  they 
do  not  generate  a  consistent  and  wel l-Integraicd  solution  to  the  problem. 

The  variations  In  performance,  both  within  and  between  levels  of 
expertise,  demonstrate  the  complexities  of  learning  the  design  schema. 
Basically,  the  schema  Is  learned  through  actual  experience  In  doing  software 
designs;  textbook  knowledge  Is  not  sufficient.  The  experts'  years  of 
experience  enable  the  procedures  of  the  scheme  to  become  automatic,  freeing 
the  designer  to  focus  more  on  the  details  of  the  specific  problem.  As  the  more 
sophisticated  processes  of  the  schema  develop,  the  designer  Is  able  to  deal 
more  successfully  with  complex  problems. 

The  differences  In  the  ability  to  use  the  decomposition  process 
demonstrate  that  the  schema  develops  In  stages.  The  levels  along  this 
continuum  seem  to  correspond  to  Incremental  Improvements  In  a  designer's 
understanding  and  control  of  the  decomposition  process.  Novices  first 
understand  that  the  problem  has  to  be  broken  down  Into  smaller  parts,  although 
they  do  not  have  a  good  understanding  of  the  nature  of  those  parts.  Next  they 
add  the  Idea  that  the  breakdown  should  occur  :feratlvely;  that  Is,  they  should 


go  through  several  cycles  of  breaking  things  down.  At  the  next  level,  they 
acquire  the  ability  to  do  the  decomposition  In  terms  of  meaningful 
subproblems,  and,  finally,  to  recursively  apply  this  strategy.  The  mature 
design  schema  would  Include  at  least  the  following  additional  processes: 
refinement  of  understanding,  retrieval  of  known  solutions,  generation  of 
alternatives,  and  critical  analysis  of  solution  components. 

The  processes  people  use  to  solve  complex  problems  In  their  field  of 
expertise  are  Important  to  the  understanding  of  the  development  of  that  skill. 
In  software  design,  these  processes  appear  to  be  special Ized  versions  of  more 
general  methods,  which  are  highly  organized  and  automatic.  While  these 
processes  superficially  resemble  the  default  methods,  they  are  so  strongly 
tailored  to  the  specific  domain  that  they  should  be  considered  distinct 
methods  In  their  own  right.  For  any  sufficiently  complex  and  wel I- learned 
skill,  these  kinds  of  organizational  structures  would  seem  to  be  necessary.  A 
crucial  question,  which  remains  to  be  addressed.  Is  what  types  of  skills  lend 
themselves  to  the  development  of  such  structures. 
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DESIGN  SCHEMA  RULES:  SELECTION  RULE 


DESIGN  SCHEMA  RULE  1: 

IF  (no  current  subprobUm  exists) 

AND  (any  unsolved  subproblems  on  agenGa) 

THEN  (select  highest  priority  subproblem  or.  If  multiple 
subproblems  at  highest  priority,  select  next 
subproblem  In  breadth-first  order  at  highest 
priority  and  make  It  new  current  subproblem) 


DESIGN  SCHEMA  RULES:  SOLUTION  MODEL  DERIVATION  PROCESS 

DESIGN  SCHEMA  RULE  2: 

IF  (pis  current  subproMem) 

AND  (solution  model  for  p  does  not  exist) 
THEN  (set  goal  to  create  solution  model  for  p) 

DESIGN  SCHEMA  RULE  3: 

IF  (goal  to  create  solution  model  for  p) 

AND  (p  Is  not  well  understood) 

THEN  (retrieve  Information  relevant  to  p  and  refine 
understanding  of  p) 

AND  (add  new  subproblem  p*  to  agenda) 

AND  (make  p*  current  rubproblem) 

DESIGN  SCHEMA  RULE  4: 

IF  (goal  to  create  solution  model  for  p) 

AND  (p  Is  undersiood  <>s  "trivial") 

THEN  (assert  that  p  Is  solved) 

AND  (delete  p  as  current  subproblem) 

DESIGN  SCHEMA  RULE  5: 

IF  (goal  to  create  solution  model  for  p) 

AND  (p  Is  understood  as  "complex") 

THEN  (define  solution  model  fcr  p) 


Figure  1.  A  production  system  representation  of  the  design 
schema  control  processes. 


DESIGN  SCHEMA  RULES:  SOLUTION  RETRIEVAL  PROCESS 


DESIGN  SCHEMA  RULE  6: 

IF  (solution  model  for  p  exists) 

THEN  (search  memory  for  potential  solutions  which 
match  critical  features  of  solution  model 
for  p) 

DESIGN  SCHEMA  RULE  7: 

IF  (potential  solution  s  to  problem  p  Is  found) 

THEN  (evaluate  applicability  of  s) 

DESIGN  SCHEMA  RULE  8: 

IF  (potential  solution  s  to  problem  p  Is  highly 
appl Icabl e) 

THEN  (assert  that  p  Is  solved) 

AND  (delete  p  as  current  subproblem) 

DESIGN  SCHEMA  RULE  9: 

IF  (potential  solution  s  to  problem  p  Is  moderately 
appl Icabl e) 

THEN  (add  to  agenda  new  subproblem  p*  created  from 
solution  model  for  p  augmented  by  s) 

AND  (make  p1  current  subproblem) 

DESIGN  SCHEMA  RULE  10: 

IF  (potential  solution  s  Is  weakly  applicable) 

THEN  (reject  potential  solution  s) 


DESIGN  SCHEMA  RULES:  REFINE  SOLUTION  MODEL  DECOMPOSITION 
DESIGN  SCHEMA  RULE  11: 

IF  (no  potential  solution  to  problem  p  Is  found) 

THEN  (expand  solution  model  for  p  Into  well-defined 

subproblems  using  understanding  and  evaluation 
processes  as  needed) 

AND  (add  each  new  subproblem  generated 
to  agenda) 


Figure  1.  (Continued) 
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PAGE-KEYED  INDEXING  SYSTEM 


BACKGROUND.  A  book  publisher  requires  a  system  to  produce  a  page-keyed 
Index.  This  system  will  accept  as  Input  the  source  text  of  a  book  and  produce 
as  output  a  I  1st  of  specified  Index  terms  and  the  page  numbers  on  which  each 
Index  term  appears.  This  system  Is  to  operate  In  a  batch  mode. 

DESIGN  TASK.  You  are  to  design  a  system  to  produce  a  page-keyed  Index. 
The  source  file  for  each  book  to  be  Indexed  Is  an  ASCII  file  residing  on  disk. 
Page  numbers  will  be  Indicated  on  a  line  In  the  form  /»NNNN,  where  t*  are 
marker  characters  used  to  Identify  the  occurrence  of  page  numbers  and  NNNN  Is 
the  page  number. 

The  page  number  will  appear  after  a  block  of  text  that  comprises  the  body 
of  the  page.  Normally,  a  page  contains  enough  Information  to  fill  an  8  1/2  x 
11  Inch  page.  Words  are  delimited  by  the  following  characters:  space,  period, 
comma,  semi-colon,  colon,  carriage-return,  question  mark,  quote,  double  quote, 
exclamation  point,  and  I  I ne- feed.  Words  at  the  end  of  a  line  may  be 
hyphenated  and  continued  on  the  following  line,  but  words  will  not  be 
continued  across  page  boundaries. 

A  term  file,  containing  a  list  of  terms  to  be  Indexed,  will  be  read  from 
a  card  reader.  The  term  file  contains  one  term  per  line,  where  a  term  Is  1  to 
5  words  long. 

The  system  should  read  the  source  file  and  term  file  and  find  all 
occurrences  of  each  term  to  be  Indexed.  The  output  should  contain  the  Index 
terms  listed  alphabetically  with  the  page  numbers  following  each  term  In 
alphabetical  order. 


Figure  2.  The  text  of  the  page-keyed  indexer  problem. 
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